LS4003 R tutorial 3

T-tests in R

In this tutorial we’re going to use R to calculate our T-test results for the different examples given in the Statistics lecture.

See example below for the end point:

Make sure you’ve completed the tutorial 1 section on using R from Excel before starting here.

Install and Set-up

A refresher for how to install and set up R and RStudio.

To get set up, follow the steps below.

  1. Type AppsAnywhere into the Windows search bar. This will open in a web browser.
  2. Search for RStudio in AppsAnywhere.
  3. Click “Launch” and wait for it to install and open.

GIF of Opening RStudio

  4. Copy and paste the following into the Console on the left, and press enter.

setwd("O:/")

  5. Click the “More” cog and select “Go To Working Directory”

You should now be in your OneDrive. You should be able to recognise the files and folders listed, from what you have saved here in your other classes.

GIF of Opening OneDrive

Occasionally there is an issue with how OneDrive is loaded on the University computer.

If you get the error message:

Error in setwd("O:/") : cannot change working directory

Then try the following. Replace the underscores with your K number.

setwd("C:/Users/K______/")

Click the “More” cog and select “Go To Working Directory”

Then find and click on the folder:

OneDrive Kingston University

Click the “More” cog and select “Set As Working Directory”.

GIF of Opening OneDrive with common error

If you don’t already have a folder for LS4003 Statistics, then you can create one by clicking “New Folder” and entering a name.

If your new folder doesn’t appear, click the refresh button (to the right of the more cog).

Then:

  1. Click into your new folder

  2. Click the More Cog and select “Set As Working Directory”

GIF of Making a Folder

Once you’re in your folder you can create and save an R file. This is where you put your code.

  1. Click on the Green Plus icon and select “R Script”
  2. In the top bar, click “File” and then “Save”
  3. Give your file a name (e.g. “R_tutorial_1” )

When you make any changes, you can save the file by going to File -> Save.

You can also save by holding down Control and S at the same time.

GIF of Making an R File

Note

This will automatically add the “.R” extension so we know it’s an R file - R_tutorial_1.R

Warning

Make sure you can find your file in File Explorer. Always back up your work, for example by saving it in OneDrive or emailing it to yourself, so that you don’t lose your progress.

You’re now ready to run some R code!

  1. Copy and paste the following into your R file:
value <- "Hello World"
value
  2. Highlight both lines and click the “Run” icon (green arrow)

You should see a result in your console (bottom left panel) and your environment (top right panel)

You’re now ready to work through the worksheet! As you go, try and figure out what each bit of code is doing. What happens if you change something?

GIF of Running an R File

Posit Cloud is an online, cloud-based option. It’s a bit more limited than running on a university computer or your own computer, but the free option should be enough for this module.

Go to Posit Cloud and create a free account

Log in, then go to New Project -> New RStudio Project.

Make a new folder in the bottom right panel (by clicking the New Folder button) called “LS4003_Statistics”.

Click on this folder to enter it, and then click the More cog (bottom right panel) and select “Set as Working Directory”.

To run R on your own machine, you have to install R (the programming language) and RStudio (the development environment).

When installing, click the most appropriate option for your machine (Windows/Mac/Linux)

Install R

Install RStudio

Once you have installed both, open RStudio.

Navigate to your Documents folder in the bottom right panel. (If you can’t find it, type setwd("~/Documents") into the console on the bottom left, then click the More cog on the bottom right and select “Go to Working Directory”)

Create a new folder called LS4003_Statistics by clicking the New Folder button on the right hand side.

Click on your folder (LS4003_Statistics) to enter it.

Set that as your final working directory by clicking on the ‘More’ cog icon again and selecting “Set as Working Directory”.

Unpaired t-test

We’re going to be using the same examples as in Statistics lecture 3 for each of our tests.

Add the data into R

The first thing we need to do is copy our data into R. As these are fairly small datasets, we can manually type in our data using the c() function to create a vector of values.
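As a rough sketch of this step, the code below builds two vectors with c(). The numbers are placeholders invented for illustration, not the lecture values, so replace them with the 16 measurements per group from Statistics lecture 3.

```r
# Placeholder values for illustration only (NOT the lecture data).
# Replace them with the 16 measurements per group from the lecture.
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)

# Check that we typed in the right number of values for each group
length(NonDiabetic)
length(Diabetic)
```

If either length is not what you expect, look for a missing or doubled comma in the c() call.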

Summarise the data

We can use the summary() function to see the minimum, quartiles, median, mean and maximum for each set of values. The interquartile range is the gap between the 1st and 3rd quartiles.
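A minimal sketch of the summary step, again using placeholder values rather than the lecture data. IQR() is a base R function and is included here as an optional extra.

```r
# Placeholder values for illustration only (NOT the lecture data)
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)

# Min, 1st quartile, median, mean, 3rd quartile and max for each group
summary(NonDiabetic)
summary(Diabetic)

# Optional: the interquartile range as a single number
IQR(NonDiabetic)
IQR(Diabetic)
```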

Calculate the unpaired t-test p-value

Using the function t.test(), we can add our two sets of values as arguments and calculate the p-value. This should match the result from Excel.
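A minimal sketch of this step. It uses placeholder values rather than the lecture data, so the p-value will not match Excel until you swap in the real values. The name unpaired_t_test_result follows the option examples used in this tutorial.

```r
# Placeholder values for illustration only (NOT the lecture data)
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)

# Run the t-test with the default options and store the result
unpaired_t_test_result <- t.test(NonDiabetic, Diabetic)

# Print the full result (t statistic, degrees of freedom, p-value, means)
unpaired_t_test_result

# Pull out just the p-value
unpaired_t_test_result$p.value
```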

That’s our t-test done already!

We didn’t have to use any options here as the default test is two tailed, unpaired, and does not assume equal variances (this is Welch’s t-test). If you want to change any of the options, you can try:

  • alternative for two tailed or one tailed
    • “two.sided” for a two tailed test,
    • “greater” for one tailed where the mean of the first variable is greater than the mean of the second variable
    • “less” for one tailed where the mean of the first variable is less than the mean of the second variable
    • unpaired_t_test_result <- t.test(NonDiabetic, Diabetic, alternative="greater")
  • paired for paired or unpaired
    • TRUE for paired
    • FALSE for unpaired
    • unpaired_t_test_result <- t.test(NonDiabetic, Diabetic, paired = TRUE)
  • mu for the value the mean is compared against in a one-sample test, e.g. mu = 40
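To see how the alternative option changes the result, the sketch below runs the same test three ways on placeholder values (not the lecture data). The variable names two_sided, greater and less are made up for this example.

```r
# Placeholder values for illustration only (NOT the lecture data)
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)

# The same comparison with each setting of 'alternative'
two_sided <- t.test(NonDiabetic, Diabetic, alternative = "two.sided")
greater   <- t.test(NonDiabetic, Diabetic, alternative = "greater")
less      <- t.test(NonDiabetic, Diabetic, alternative = "less")

# Compare the three p-values side by side
c(two_sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value)
```

The "greater" and "less" p-values add up to 1, and the two tailed p-value is double the smaller of the two, so choosing the wrong direction can hide a real difference.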

Create a dataframe with the data

We can use the following code to create a dataframe with the above vectors.

We need a column that contains the correct group - “NonDiabetic” or “Diabetic” for each value. We can use the rep() function for this which replicates values.

  • c("NonDiabetic", "Diabetic") is a vector of the items we want to repeat
  • each = 16 repeats each item 16 times in a row. We could instead use times = c(16, 16).
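A sketch of how the dataframe could be built with rep(). The dataframe name Glycemia_DataFrame and the column names group and glycemia are assumptions made for this example, and the values are placeholders rather than the lecture data.

```r
# Placeholder values for illustration only (NOT the lecture data)
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)

# One row per measurement: a group label and the value itself.
# The labels must be in the same order as the values in c(...).
Glycemia_DataFrame <- data.frame(
  group    = rep(c("NonDiabetic", "Diabetic"), each = 16),
  glycemia = c(NonDiabetic, Diabetic)
)

# Look at the first few rows and count the rows per group
head(Glycemia_DataFrame)
table(Glycemia_DataFrame$group)
```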

Visualise the results as a boxplot

We can now create a boxplot in the same way we did in Tutorial 1. We can also add a t-test result using the function stat_compare_means() from the ggpubr package.
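A sketch of the plotting step, assuming Tutorial 1 built its boxplot with ggplot2 and geom_boxplot(), and that the ggplot2 and ggpubr packages are installed. The dataframe and column names are the same made-up ones as before, and the values are placeholders.

```r
library(ggplot2)
library(ggpubr)

# Placeholder values for illustration only (NOT the lecture data)
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)
Glycemia_DataFrame <- data.frame(
  group    = rep(c("NonDiabetic", "Diabetic"), each = 16),
  glycemia = c(NonDiabetic, Diabetic)
)

# Boxplot of the values by group, with a t-test p-value added on top
glycemia_plot <- ggplot(Glycemia_DataFrame, aes(x = group, y = glycemia)) +
  geom_boxplot() +
  stat_compare_means(method = "t.test")

glycemia_plot
```

Setting method = "t.test" matters here, because stat_compare_means() uses a Wilcoxon test if no method is given.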

Dataframe to a list of values

We’ve just turned our two lists into one long dataframe, but what if we want to do the opposite?

If you want to extract the values for a particular group, we can use the following structure:
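One way to do this in base R is to pick the value column with $ and then keep only the rows where the group matches. This is a sketch and may differ from the exact structure used in the original worksheet. The dataframe and column names are assumptions and the values are placeholders.

```r
# Placeholder values for illustration only (NOT the lecture data)
NonDiabetic <- c(4.8, 5.1, 5.4, 4.9, 5.0, 5.3, 4.7, 5.2,
                 5.5, 4.6, 5.1, 5.0, 4.9, 5.2, 5.3, 4.8)
Diabetic <- c(6.9, 7.4, 8.1, 7.0, 6.5, 7.8, 8.3, 7.2,
              6.8, 7.5, 7.9, 7.1, 8.0, 6.7, 7.3, 7.6)
Glycemia_DataFrame <- data.frame(
  group    = rep(c("NonDiabetic", "Diabetic"), each = 16),
  glycemia = c(NonDiabetic, Diabetic)
)

# dataframe$column[dataframe$group_column == "group name"]
DiabeticExtracted <- Glycemia_DataFrame$glycemia[Glycemia_DataFrame$group == "Diabetic"]

# TRUE if the extracted values match the vector we started from
identical(DiabeticExtracted, Diabetic)
```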

If you look in your environment, DiabeticExtracted should be identical to your Diabetic values.

Using this, if you are importing data from Excel, you can extract the values into groups to prepare for your t-test.

Paired t-test

Our paired t-test is very similar. First, let’s copy the data from our Basketball players example.
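A sketch of entering paired data. The vector names BeforeTraining and AfterTraining and all of the values are hypothetical stand-ins, so use the names and numbers from the lecture example instead.

```r
# Hypothetical names and placeholder values (NOT the lecture data).
# For paired data the order matters: position 1 in both vectors must be
# the same player, position 2 the same player, and so on.
BeforeTraining <- c(60, 62, 58, 65, 70, 61, 59, 66, 63, 64)
AfterTraining  <- c(63, 66, 59, 70, 72, 65, 60, 71, 66, 67)

# Paired data must have the same number of values in each vector
length(BeforeTraining) == length(AfterTraining)
```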

Summarise the data

We can use the summary() function to see the minimum, quartiles, median, mean and maximum for each set of values. The interquartile range is the gap between the 1st and 3rd quartiles.

Calculate the paired t-test p-value

Using the function t.test(), we can add our two sets of values as arguments and calculate the p-value. This should match the result from Excel.

We need to specify the option paired = TRUE for it to be a paired t-test.
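A minimal sketch of the paired test on hypothetical vectors (the names and values are placeholders, not the lecture data). The last line shows that a paired t-test is the same as a one-sample t-test on the differences between each pair.

```r
# Hypothetical names and placeholder values (NOT the lecture data)
BeforeTraining <- c(60, 62, 58, 65, 70, 61, 59, 66, 63, 64)
AfterTraining  <- c(63, 66, 59, 70, 72, 65, 60, 71, 66, 67)

# paired = TRUE compares each player with themselves
paired_t_test_result <- t.test(BeforeTraining, AfterTraining, paired = TRUE)
paired_t_test_result
paired_t_test_result$p.value

# Same p-value: a one-sample t-test on the pairwise differences
t.test(BeforeTraining - AfterTraining)$p.value
```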

Create a dataframe with the data

We can use the following code to create a dataframe with the above vectors.

Fill in the gaps, and if you’re stuck have a look at how we did this for the glycemia dataset.

Visualise the results as a boxplot

We can do a boxplot and add our p-value in the same way as for the unpaired test by adding the option paired=TRUE.
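A sketch of the paired boxplot, assuming ggplot2 and ggpubr are installed. The vector names, the dataframe name Basketball_DataFrame, the column names and all values are hypothetical. Try the dataframe exercise yourself before reading the code.

```r
library(ggplot2)
library(ggpubr)

# Hypothetical names and placeholder values (NOT the lecture data)
BeforeTraining <- c(60, 62, 58, 65, 70, 61, 59, 66, 63, 64)
AfterTraining  <- c(63, 66, 59, 70, 72, 65, 60, 71, 66, 67)

# Players must appear in the same order within each group
Basketball_DataFrame <- data.frame(
  group = rep(c("BeforeTraining", "AfterTraining"), each = 10),
  score = c(BeforeTraining, AfterTraining)
)

# paired = TRUE tells stat_compare_means() to run a paired t-test
basketball_plot <- ggplot(Basketball_DataFrame, aes(x = group, y = score)) +
  geom_boxplot() +
  stat_compare_means(method = "t.test", paired = TRUE)

basketball_plot
```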

One-tailed t-test

We can follow the same process for the one-tailed t-test. First, we need our list of grades from our example.

Summarise the data

We can use the summary() function to see the minimum, quartiles, median, mean and maximum for each set of values. The interquartile range is the gap between the 1st and 3rd quartiles.

Calculate the one tailed t-test p-value

Using the function t.test(), we can add our one set of values and the number we want to compare it to. In this example, we’re looking to see if our mean value is greater than 40.

We need to specify the option alternative = 'greater' to do a one-tailed test of whether the mean of our Grades is significantly above our value for mu, which we have set to 40 (a pass).
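A minimal sketch of the one-tailed test. The grades below are placeholders invented for illustration, not the lecture values, and the name one_tailed_t_test_result is made up for this example.

```r
# Placeholder grades for illustration only (NOT the lecture data)
Grades <- c(52, 61, 45, 38, 70, 58, 49, 66, 55, 43, 62, 57)

# Quick look at the values before testing
summary(Grades)

# One set of values, compared against mu = 40.
# alternative = "greater" asks: is the mean significantly above 40?
one_tailed_t_test_result <- t.test(Grades, mu = 40, alternative = "greater")
one_tailed_t_test_result
one_tailed_t_test_result$p.value
```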

Create a dataframe and boxplot

Because we only have one group of data values, we can create our dataframe and draw our boxplot very simply.

We don’t need to annotate our p-value on here - it wouldn’t make much sense to do so as our p-value was specifically assessing if the mean was higher than 40 which is hard to represent visually.
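A sketch of the single-group dataframe and boxplot, assuming ggplot2 is installed. The dataframe name Grades_DataFrame and column name grade are assumptions, and the values are placeholders.

```r
library(ggplot2)

# Placeholder grades for illustration only (NOT the lecture data)
Grades <- c(52, 61, 45, 38, 70, 58, 49, 66, 55, 43, 62, 57)

# With one group we only need a single column
Grades_DataFrame <- data.frame(grade = Grades)

# x is a fixed label because there is only one box to draw
grades_plot <- ggplot(Grades_DataFrame, aes(x = "Grades", y = grade)) +
  geom_boxplot()

grades_plot
```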

ANOVA

Our final example uses a one-way ANOVA. We’re using the example of salaries and degrees - note that the groups don’t all have the same number of values.

Create a dataframe

To do an ANOVA, we need to organise our data into a dataframe.
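A sketch of how the dataframe could be laid out when the groups have different sizes. The salaries are placeholders, not the lecture values. "Biology" is a hypothetical stand-in for the third degree group, and the order of the groups is an assumption, so match both to the lecture example.

```r
# Placeholder salaries in thousands (NOT the lecture data).
# "Biology" and the group order are assumptions made for this sketch.
Degrees_DataFrame <- data.frame(
  salary = c(52, 55, 49, 58, 61, 54, 50, 57, 53,   # 9 Economics values
             41, 38, 44, 40, 43, 39, 42,           # 7 History values
             46, 48, 45, 50, 47, 44, 49, 51, 46),  # 9 Biology values
  # times = c(9, 7, 9) repeats each label a different number of times,
  # in the same order as the salaries above
  group = rep(c("Economics", "History", "Biology"), times = c(9, 7, 9))
)

# Count the rows per group to check the labels line up
table(Degrees_DataFrame$group)
```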

Question

Why have we used c(9,7,9) here? What if you change these numbers around?

If you make a change, have a look at your Degrees_DataFrame values and see if they still match our original data.

Calculate the ANOVA p-value

We can calculate the ANOVA p-value by using the aov() function.

We use salary ~ group to assess salary as a function of the degree group.

We always put the response variable before the ~ and the explanatory variable after. Another way of reading salary ~ group would be “Salary depends on degree group”.
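A minimal sketch of the ANOVA step on placeholder salaries (not the lecture data). "Biology" is a hypothetical third group, and the names anova_result and anova_p_value are made up for this example.

```r
# Placeholder salaries in thousands (NOT the lecture data)
Degrees_DataFrame <- data.frame(
  salary = c(52, 55, 49, 58, 61, 54, 50, 57, 53,
             41, 38, 44, 40, 43, 39, 42,
             46, 48, 45, 50, 47, 44, 49, 51, 46),
  group = rep(c("Economics", "History", "Biology"), times = c(9, 7, 9))
)

# Response (salary) before the ~, explanatory variable (group) after
anova_result <- aov(salary ~ group, data = Degrees_DataFrame)

# The p-value is in the "Pr(>F)" column of the summary table
summary(anova_result)

# Pull out just the p-value for the group row
anova_p_value <- summary(anova_result)[[1]][["Pr(>F)"]][1]
anova_p_value
```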

Pairwise t test results

Our ANOVA was significant - but now we need to know which pairs are significantly different.

We can use pairwise.t.test() to do a t-test for all pairs of groups in our dataset.

We use the options:

  • pool.sd = FALSE so that variance is calculated independently for each group
  • p.adjust.method = "none" because, by default, this function applies a Holm-Bonferroni correction to minimize false positives, which is beyond the scope of this course.
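A sketch of the pairwise step on placeholder salaries (not the lecture data), with "Biology" as a hypothetical third group. The name pairwise_result is made up for this example.

```r
# Placeholder salaries in thousands (NOT the lecture data)
Degrees_DataFrame <- data.frame(
  salary = c(52, 55, 49, 58, 61, 54, 50, 57, 53,
             41, 38, 44, 40, 43, 39, 42,
             46, 48, 45, 50, 47, 44, 49, 51, 46),
  group = rep(c("Economics", "History", "Biology"), times = c(9, 7, 9))
)

# First argument: the values. Second argument: the group labels.
pairwise_result <- pairwise.t.test(Degrees_DataFrame$salary,
                                   Degrees_DataFrame$group,
                                   pool.sd = FALSE,
                                   p.adjust.method = "none")

# A table of p-values, one for each pair of groups
pairwise_result$p.value
```

With these options each entry in the table is the same p-value you would get from running t.test() on that pair of groups yourself.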

Visualise the results

We can create a boxplot and plot all of our pairwise comparison p values.

To do this, first we can create a list containing all the pairs we want to plot the p-value for. Remember you only need to do each pair once - if you have c("Economics", "History") you don’t also need c("History", "Economics").
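A sketch of the list of pairs and the plot, assuming ggplot2 and ggpubr are installed. The salaries are placeholders, "Biology" is a hypothetical third group, and the names comparisons_list and degrees_plot are made up for this example.

```r
library(ggplot2)
library(ggpubr)

# Placeholder salaries in thousands (NOT the lecture data)
Degrees_DataFrame <- data.frame(
  salary = c(52, 55, 49, 58, 61, 54, 50, 57, 53,
             41, 38, 44, 40, 43, 39, 42,
             46, 48, 45, 50, 47, 44, 49, 51, 46),
  group = rep(c("Economics", "History", "Biology"), times = c(9, 7, 9))
)

# Each pair appears once: three groups give three pairs
comparisons_list <- list(c("Economics", "History"),
                         c("Economics", "Biology"),
                         c("History", "Biology"))

# One bracket and p-value is drawn for each pair in the list
degrees_plot <- ggplot(Degrees_DataFrame, aes(x = group, y = salary)) +
  geom_boxplot() +
  stat_compare_means(comparisons = comparisons_list, method = "t.test")

degrees_plot
```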

Extension

If you finish the tutorial, go back to Worksheet 1 with the Penguins dataset.

Is there anything in this dataset that would suit a t-test? Which tests would you use?

Go through and see if you can find any significant differences between the three penguin species.

Adelie Penguin

Chinstrap Penguin

Gentoo Penguin
